Introduction
FungiExpresZ is a web-based platform for analyzing and visualizing fungal gene expression data. It allows you to analyze and visualize …
- NCBI SRA fungal gene expression data.
- User uploaded gene expression data.
- NCBI SRA data combined with User uploaded data (1+2).
It contains normalized gene expression values for more than 12,000 NCBI SRA data sets from 8 different fungal species, along with gene annotations and GO data for more than 100 different fungal species. For each of the 3 strategies mentioned above, you can generate 12 different data exploratory plots and 6 different GO plots.
Data exploratory plots
- Scatter plot
- Multi-scatter plot
- CorrHeatBox
- Density plot
- Histogram
- Joy plot
- Box plot
- Violin plot
- Bar plot
- PCA plot
- Line plot
- Heatmap
GO plots
- EMAP plot
- CNET plot
- Dot plot
- Bar plot
- Heat box
- Upset plot
The purpose of this document is to explain key functionalities and methods implemented in FungiExpresZ.
Getting access
There are three ways in which you can access FungiExpresZ.
Online
FungiExpresZ is hosted on shinyapps.io and can be accessed through the link https://cparsania.shinyapps.io/FungiExpresZ/. This is one of the quickest ways to access FungiExpresZ. However, due to limited computational resources, we recommend this approach only when the data is comparatively small (< 10 MB) and/or you need a quick figure from the data. The current setup allows approx. 30 concurrent users to access FungiExpresZ online. Additional traffic may disconnect random users’ sessions, and you may end up losing all the analysis performed. Even without excess traffic, the idle session timeout is 30 minutes, so you may lose your analysis if you plan to continue later. For a stable, robust and long-lasting session, it is recommended to use one of the following two approaches.
Run Locally
Use as a docker container
This approach is highly recommended for running locally because, as a user, you do not need to worry about any dependency-related issues.
Install docker desktop
Follow the instructions given below to install docker desktop on …
Pull FungiExpresZ docker image to a local computer
Once docker desktop is installed, the next step is to pull the FungiExpresZ docker image. Before you pull the image, make sure docker desktop is running. Then, to pull the image, open the terminal and enter the command below.
docker pull cparsania/fungiexpresz:<tagName>
Replace <tagName> with the version you want to download. For example, the command below will download version 1.1.0.
docker pull cparsania/fungiexpresz:1.1.0
Possible values for <tagName> can be obtained from here. It is recommended to pull the latest available tag.
Run container
After getting the image on the local computer, it can be run as a container. The command below will open the port given as <port_number> on the local computer and launch the application on it.
docker run -p <port_number>:80 cparsania/fungiexpresz:<tagName>
You can give any valid TCP <port_number> that is not occupied by your system (e.g. 3232, 3233, 5434, … etc.).
A successful launch will print the standard R welcome message on the terminal, with the final line http://0.0.0.0:80.
Run on browser, Finally.!!
After launch, opening one of these URLs, http://localhost:<port_number> or http://127.0.0.1:<port_number> or http://<your_ip_address>:<port_number>, should launch the application in your browser.
Congrats!! 🎉🎉🎉🎉 Your application will keep running until you stop the container explicitly.
Memory usage for docker
Depending upon the size of the data you are analyzing, you may need to assign more computational resources to docker than the default, which is 2 GB of memory and 4 CPUs on a mac with 32 GB memory and 8 CPUs. The default behavior can be changed from Docker -> Preferences -> Advanced.
We recommend allocating a maximum of 4 GB of memory to docker before you run the FungiExpresZ docker image.
How to stop container
The container will be active until it is explicitly stopped. You can stop the container using the commands below in a new terminal window.
get container id
docker ps
docker stop <CONTAINER ID>
Install as an R package
FungiExpresZ can be installed as an R package on a local computer or server. To do so, basic R programming skills are required.
Prerequisites
R version (>= 3.6.1)
Installing FungiExpresZ as an R package differs from the usual procedure. To prevent potential breakdown of various FungiExpresZ utilities, it is recommended that FungiExpresZ uses the same versions of R packages as were used during development. The steps below will install the required versions of dependency packages without affecting packages already installed on your computer.
To keep already installed R packages unaffected in local computer, FungiExpresZ will be installed in a separate directory.
Installation steps
1) Create an installation directory (e.g. FungiExpresZ_R_pkg).
2) Download the renv.lock file from here.
renv.lock file contains all the information required to install the required versions of dependency packages.
3) Move the renv.lock file to the installation directory created in step-1.
4) Download the appropriate package bundle. 👉 Download package bundle here
It is highly recommended to download the latest available version.
Mac : FungiExpresZ_<version>.tgz
Windows: FungiExpresZ_<version>.tar.gz
5) Move the package bundle to the installation directory created in step-1.
6) Install the R package renv from the terminal. Open the terminal to type the commands below.
Initiate project in a current directory.
Install required versions of dependency packages. renv.lock file must be in the same directory.
7) Install the R package devtools.
8) Install FungiExpresZ.
FungiExpresZ_1.1.0.tar.gz is the path to the bundle file downloaded in step-4.
9) Run FungiExpresZ through the installed R package.
Access through browser
Open the URL printed on the console in a browser and you are ready to go 🎉🎉🎉🎉.
Home screen
Once the application has loaded fully in a browser, it looks as shown in Fig-1.
FIGURE 1: FungiExpresZ home screen
1). Inputs
It allows you to either upload data or select data from the pre-existing SRA data. Depending upon the radio button selected (Select data or Upload/Use example data), the submit button toggles between Select and Upload. Clicking either of these opens a pop-up, explained in Fig-2 and Fig-3.
2). Assign groups
Genomics and transcriptomics data often contain sample groups (e.g. replicates, strains, time points) and gene groups (e.g. differentially expressed genes, genes specific to a pathway). Comparing them can reveal similarities and contrasts between these groups, which ultimately leads to meaningful biological insights. The Assign groups feature allows you to upload user-defined Sample groups and Gene groups. Additional information on the file format for assigning groups is given in Fig-4 and Fig-5. Once the groups are uploaded, you may color, cluster or facet the expression values in different plots according to the groups assigned.
3). View active groups
By clicking on View active groups one can check the current active groups (both sample and gene).
4). Usage
It displays the locations across the globe where FungiExpresZ has been used at least once. A blinking sign indicates access “at this moment” from a particular location. By clicking anywhere in the map you can get more insights into usage statistics across the globe.
5). - 10).
Numbers 5-10 are different tabs, i.e. App, About, Downloads, News & Updates, Citations and Contact. Although you can anticipate the content of each tab from its name, details on each are given later in the tutorial. The currently selected tab in Fig-1 is App. As you can see, it contains 12 different plot panels (the one open by default is Scatter plot). Each plot panel is explained later in the tutorial.
Select/Upload data
Once you click on Select/Upload data (Fig-1 #2), the relevant pop-up will appear.
Select SRA data
FungiExpresZ contains > 13,000 pre-processed NCBI SRA data sets from 8 different fungal species. The values given are normalized gene expression values (FPKM / RPKM etc.). Some of these data have been obtained from public resources, while the remaining ones were processed by us. You can select any of these data for analysis and visualization. Once you click on the Select data button, the pop-up will appear as shown in Fig-2.
FIGURE 2: FungiExpresZ select SRA data
1) Organism
The Organism drop-down allows you to select the organism of your choice. Once an organism is selected, the table below will show data for the selected organism only.
2) Strain and 3) Genotype
The Strain and Genotype drop-downs allow you to filter by strain and genotype, respectively, for the selected Organism. These two filters are combined with the AND operator, so when they are applied together only rows satisfying both conditions are shown. P.S. The current settings don’t allow selection of more than one value in each filter.
4) Reset Strain and 5) Reset Genotype
#4 and #5 are reset options for drop-down Strain and drop-down Genotype respectively.
6) Select all rows
Once the data is filtered (by organism, strain and genotype), you need to select row(s) from the resultant table to make them available for analysis and visualization. As the name suggests, clicking on Select all rows will select all the rows displayed in the table. You may also use the shift key to select more than one row simultaneously.
7) Copy
Clicking the Copy button copies the data displayed in the table to your clipboard. You may paste it into any spreadsheet-like program to better organize and understand it.
8) Download
Using the Download button, you can download the data displayed in the table in one of three formats, i.e. .csv, .pdf or excel.
9) Column visibility
Clicking on Column visibility lists the hidden columns of the table. You may select one or more of them to make them visible in the displayed table.
10) Search
Besides the Organism, Strain and Genotype filters, you can perform a free-text filter using the text box under the Search title. The input text will be matched against all the columns displayed in the table, and matching rows will be displayed as a result.
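The free-text search can be pictured as a row filter over the displayed table. The Python sketch below is illustrative only and is not the application's code: it assumes a case-insensitive substring match, and the function name `search_rows`, the column names and the ids are hypothetical.

```python
def search_rows(rows, query):
    """Return the rows in which any displayed column contains `query`."""
    needle = query.strip().lower()
    if not needle:
        return list(rows)  # empty search box: keep every row
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]

# Hypothetical table rows; ids and column names are made up for illustration.
sra_table = [
    {"sra_id": "SRR0000001", "organism": "Aspergillus nidulans", "genotype": "wild type"},
    {"sra_id": "SRR0000002", "organism": "Candida albicans", "genotype": "mutant"},
]
matches = search_rows(sra_table, "candida")
```

Here `matches` holds only the second row, because "candida" occurs in its organism column.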
11) Clear all
Clicking on Clear all will deselect any selected rows.
12) Submit
Once the rows are selected, clicking the Submit button will make the selected data available for analysis. For the selected data, the SRA id will be displayed as the sample identity in each plot panel.
Upload user data
Transcriptomics data is often found in a tabular format where columns are samples (e.g. replicates, time points, multiple strain types) and rows are genes. Each cell in the table contains a normalized gene expression value. You can either upload such tabular data as a .txt file or paste it in a text box in FungiExpresZ for analysis and visualization. Once you click on the Upload button (Fig-1 #2), the pop-up shown in Fig-3 will appear.
FIGURE 3: FungiExpresZ upload user data
1) Upload example data
When this check-box is selected, the example data will be activated.
2) Upload data
This section allows you to upload your own data. As mentioned above, you may either upload tabular data in a .txt file or paste the data in the given text box. In both cases, column names and row names are required. Later, while analyzing the data, column names will appear as the sample identity in each plot panel, while row names will be used in the background to fetch the organism’s annotations (gene start, gene end, gene strand, gene description, GO terms etc.).
3) Select column separator
Selecting the correct column separator for the uploaded data is required to upload the data successfully. The default is tab. You may also select comma or semicolon.
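To illustrate why the separator matters, the following minimal Python sketch parses delimited text into sample names and per-gene values. It is not how FungiExpresZ reads files; the function `parse_expression_table`, the gene ids, and the assumption that the header row includes a label for the gene id column are all hypothetical.

```python
import csv

def parse_expression_table(text, separator="\t"):
    """Parse delimited text: first row = sample names, first column = gene ids."""
    lines = [line for line in text.splitlines() if line.strip()]
    rows = list(csv.reader(lines, delimiter=separator))
    if not rows:
        raise ValueError("no data found")
    samples = rows[0][1:]  # assumed: the first header cell labels the gene id column
    if not samples:
        raise ValueError("no sample columns found; check the column separator")
    table = {}
    for fields in rows[1:]:
        gene_id, values = fields[0], fields[1:]
        if len(values) != len(samples):
            raise ValueError(f"row {gene_id!r} does not match the header width")
        table[gene_id] = dict(zip(samples, (float(v) for v in values)))
    return samples, table

# Hypothetical tab-separated input with two samples and two genes.
example = "gene\tWT\tKO\nAN0001\t1.5\t0\nAN0002\t10\t20\n"
samples, table = parse_expression_table(example, separator="\t")
```

Parsing the same text with `separator=","` raises a `ValueError` in this sketch, since every line then collapses into a single column.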
4) Select species
Selecting a species is optional. However, correct species selection is required to perform gene annotation and GO analysis. Once a species is selected, FungiExpresZ matches the row names of the uploaded data to the database ids of the selected species in the background. For the selected species, you can cross-check your ids against the example database id given below the species selection drop-down menu.
5) Log transformation
Due to the wide range of RPKM / FPKM / TPM values, they often need to be log transformed before visualization. Once the data is uploaded, FungiExpresZ allows you to log transform it (log2 or log10). To avoid NAs from the log transformation of 0s, FungiExpresZ adds a constant 1 to all the uploaded values.
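The transformation described above amounts to log(x + 1) in the chosen base. A minimal Python sketch, with the hypothetical helper name `log_transform`, might look like this:

```python
import math

def log_transform(values, base=2, pseudocount=1.0):
    """Return log_base(value + pseudocount) for each expression value."""
    if base not in (2, 10):
        raise ValueError("base must be 2 or 10")
    log = math.log2 if base == 2 else math.log10
    return [log(value + pseudocount) for value in values]

# A zero stays finite because the constant 1 is added before taking the log.
log2_values = log_transform([0, 1, 3], base=2)
log10_values = log_transform([0, 9, 99], base=10)
```

Adding the constant keeps zeros at 0 after transformation and compresses large values, which is why a wide FPKM range becomes easier to plot.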
6) Join data
FungiExpresZ allows you to perform a combined analysis of uploaded data with pre-existing NCBI-SRA data. To use this functionality, you first need to select the pre-existing NCBI-SRA data of your interest in FungiExpresZ. To know more about how to select SRA data, refer to the section Select SRA data (Fig-2). The next step is to upload the data that you want to join with the selected NCBI-SRA data. Once both steps are done, you can select the Join data option to merge the two data sets in the background. For the join operation to succeed, the row names of the uploaded data must match the database ids of the selected species.
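The requirement that row names match database ids follows from joining on the gene id. The sketch below is an assumption-laden illustration in Python: it keeps only gene ids present in both tables (an inner join), which the text above does not confirm, and `join_expression` and all ids are hypothetical.

```python
def join_expression(selected_sra, uploaded):
    """Merge two {gene_id: {sample: value}} tables on the gene id."""
    shared = [gene for gene in uploaded if gene in selected_sra]
    # Assumed: sample names do not collide between the two tables.
    return {gene: {**selected_sra[gene], **uploaded[gene]} for gene in shared}

# Hypothetical data: one uploaded gene id does not exist in the database.
sra = {"AN0001": {"SRR0000001": 5.0}, "AN0002": {"SRR0000001": 7.5}}
user = {"AN0001": {"my_sample": 4.0}, "XX9999": {"my_sample": 1.0}}
joined = join_expression(sra, user)
```

In this sketch the unmatched id `XX9999` contributes nothing to the joined table, which shows why mismatched row names make a join ineffective.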
7) Submit
Clicking on Submit will lock all the inputs made above, and the data will be available for analysis and visualization.
Assign groups
Genomics and transcriptomics data often contain sample groups (e.g. replicates, strains, time points) and gene groups (e.g. differentially expressed genes, genes specific to a pathway). Comparing them can reveal similarities and contrasts between these groups, which ultimately leads to meaningful biological insights. The Assign groups feature allows you to upload user-defined Sample groups and Gene groups. Here, we discuss the technicalities of group assignment.
FIGURE 4: FungiExpresZ define sample groups.
Sample groups
Clicking the Sample groups button will open the pop-up shown in Fig-4. There are three different ways in which you can assign sample groups for your data.
1) Manual
Using this option, you can assign a maximum of two sample groups. Under the section Group name you can enter a unique name for each group; the default group names are Group_1 and Group_2. Under the Group members drop-down you can select samples from the active data and assign them to one of the two groups. Each sample must be assigned to one of the two groups.
2) Upload
The second option for assigning sample groups is via user data upload. You can either upload groups via a .txt file or paste data in the given text box. Both ways require an identical data format, i.e. a matrix of two columns, where the first column contains the group name and the second column contains the group members. A tab, a semicolon or a comma can be used as the column delimiter. The first row of the matrix will be treated as column names, which you can choose freely. Each group member (column 2) must be assigned to a single group. Only group members (column 2) that are used as sample names in the uploaded data will be used to assign the groups; the rest will be discarded.
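The two-column format can be sketched as a small parser. This Python example only illustrates the rules stated above (header row, one group per member, unknown members discarded); raising an error on a doubly assigned member is an assumption, and `parse_groups` and the sample names are hypothetical.

```python
def parse_groups(text, known_members, separator="\t"):
    """Return {group name: [members]} from two-column text with a header row."""
    known = set(known_members)
    member_to_group = {}
    lines = [line for line in text.splitlines() if line.strip()]
    for line in lines[1:]:  # the first row holds the column names
        fields = [field.strip() for field in line.split(separator)]
        if len(fields) != 2:
            raise ValueError(f"expected two columns, got {len(fields)}: {line!r}")
        group, member = fields
        if member not in known:
            continue  # members absent from the active data are discarded
        if member_to_group.get(member, group) != group:
            raise ValueError(f"{member!r} is assigned to more than one group")
        member_to_group[member] = group
    groups = {}
    for member, group in member_to_group.items():
        groups.setdefault(group, []).append(member)
    return groups

# Hypothetical upload: the last member is not a sample in the active data.
uploaded = "group\tsample\ncontrol\tWT_1\ncontrol\tWT_2\ntreated\tKO_1\ntreated\tnot_in_data\n"
groups = parse_groups(uploaded, known_members=["WT_1", "WT_2", "KO_1"])
```

The same function would apply to gene groups by passing gene names as `known_members`.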
3) Group by BioProject(NCBI)
While analyzing samples from NCBI-SRA data, alone or along with user-uploaded data, it is important to know which SRA samples are from the same study and which are from different studies. In the background, FungiExpresZ groups NCBI-SRA samples by NCBI-BioProjectID. You can activate sample groups by NCBI-BioProjectID by clicking the Submit button under this panel.
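Conceptually, this grouping inverts a sample-to-BioProject mapping. A minimal Python sketch, with a hypothetical function name and made-up accession ids, could be:

```python
from collections import defaultdict

def group_by_bioproject(sample_to_project):
    """Turn {sample: BioProject id} into {BioProject id: [samples]}."""
    groups = defaultdict(list)
    for sample, project in sample_to_project.items():
        groups[project].append(sample)
    return dict(groups)

# Hypothetical SRA run ids and BioProject ids, for illustration only.
metadata = {
    "SRR0000001": "PRJNA000001",
    "SRR0000002": "PRJNA000001",
    "SRR0000003": "PRJNA000002",
}
groups_by_project = group_by_bioproject(metadata)
```

Samples that share a BioProject id end up in the same group, so plots can be colored or faceted by study.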
6) Submit
Clicking the Submit button will activate the assigned groups. It is important to note that every time you change the gene expression data, the group data will be lost even if the groups are the same. To reactivate the same groups you need to click Submit again, or upload groups again if the groups are different.
Gene groups
Clicking the Gene groups button will open the pop-up shown in Fig-5. You can either upload a file or paste data in the given text box to assign gene groups.
FIGURE 5: FungiExpresZ define gene groups.
1) Upload
To assign gene groups, you can either upload groups via a .txt file or paste data in the given text box. Both ways require an identical data format, i.e. a matrix of two columns, where the first column contains the group name and the second column contains the group members. The first row of the matrix will be treated as column names, which you can choose freely. Each group member (column 2) must be assigned to a single group. Only group members (column 2) that are used as gene names in the uploaded data will be used to assign the groups; the rest will be discarded.
2) Column separator
While uploading the groups you can use a tab, a semicolon or a comma as the column delimiter.
3) Example data snap
A snapshot showing the format for gene groups.
Plot panels
FungiExpresZ allows you to generate 12 data exploratory plots, which are 1) Scatter plot, 2) Multi-scatter plot, 3) CorrHeatBox, 4) Density plot, 5) Histogram, 6) Joy plot, 7) Box plot, 8) Violin plot, 9) Bar plot, 10) PCA plot, 11) Line plot and 12) Heatmap. Each of these has an independent panel containing the necessary inputs, the plot output and other plot settings. The sections below discuss each plot panel and its options in detail.
Plot input panels
Scatter plot
Fig-6 shows the scatter plot input panel.
FIGURE 6: FungiExpresZ Scatter plot input panel.
1) Select sample (X-axis)
Select the sample to be shown on the X-axis of the scatter plot.
2) Select sample (Y-axis)
Select the sample to be shown on the Y-axis of the scatter plot.
3) Select gene groups
By default, all the observations / genes will be displayed in the scatter plot. Optionally, selecting gene group(s) allows you to show group-specific observations / genes in the scatter plot.
4) Plot
Clicking the ‘Plot’ button will open a plot panel containing the resultant scatter plot and other plot settings.
Multi-scatter plot, CorrHeatBox, Density plot, Histogram, Joy plot, Box plot, Violin plot and PCA plot
Multi-scatter plot, CorrHeatBox, Density plot, Histogram, Joy plot, Box plot, Violin plot and PCA plot require same inputs, which are shown in Fig-7.
FIGURE 7: FungiExpresZ Multi-scatter plot, CorrHeatBox, Density plot, Histogram, Joy plot, Box plot, Violin plot and PCA plot input panel.
1) Select sample(s)
You can select one or more samples from the drop down ‘Select samples’. All selected samples will be displayed in the resultant plot.
2) Select gene group(s)
By default, all the observations / genes will be displayed in the resultant plots. Optionally, selecting one or more gene group(s) allows you to show the group-specific set of observations / genes in the output plot.
3) Plot
Clicking the ‘Plot’ button will open a plot panel containing the resultant plot and other plot settings.
Bar plot inputs
The purpose of the bar plot given here is to check the expression of individual gene(s) in multiple samples or sample groups. Bar plot input panel is shown in the Fig-8.
FIGURE 8: FungiExpresZ Bar plot input panel.
1) Select sample(s)
You can select one or more samples from the drop down ‘Select samples’. All selected samples will be displayed in the resultant plot.
2) Select gene(s)
Instead of gene groups, as in other plots, in the bar plot you select one or more genes to be displayed in the output bar plot.
3) Plot
Clicking the ‘Plot’ button will open a plot panel containing the resultant bar plot and other plot settings.
Line plot inputs
A line plot is a powerful way to show the trends of observations / genes across multiple samples. For example, one way to use this plot is to show the expression of genes across several time-point samples. FungiExpresZ also allows you to cluster observations / genes in both an unsupervised and a supervised way, and to visualize the clustered data simultaneously. Additionally, you can display an average line (mean or median) instead of an individual line for each gene / observation in each cluster. The input panel of the line plot is shown in Fig-9.
FIGURE 9: FungiExpresZ Line plot input panel
1) Select sample(s)
You can select one or more samples from the drop down ‘Select samples’. All selected samples will be displayed in the resultant plot.
2) Select gene group(s)
By default, all the observations / genes will be displayed in the resultant plot. Optionally, selecting one or more gene group(s) allows you to show the group-specific set of observations / genes in the output plot.
3) Genes to plot
This option provides an additional filter on top of the genes selected under option #2. You can choose between displaying ‘# top variable genes (By standard deviation)’ and displaying ‘All genes’.
- # top variable genes (By standard deviation)
Selecting this option will plot the number of genes specified in the given numeric input. It calculates the standard deviation of each observation / gene across the samples selected in #1, ranks them from high to low by standard deviation, and plots the specified number of top-ranked genes / observations.
- All genes
Selecting this option will plot all the genes filtered by #2.
4) # top variable genes to show
A numeric input is required if the option ‘# top variable genes (By standard deviation)’ is selected. The input number will be used to select the top variable genes ranked by standard deviation, as described in the previous section.
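The ranking by standard deviation can be sketched in a few lines of Python. This is an illustration of the idea rather than the application's code; it assumes the sample standard deviation (n - 1 denominator, as R's `sd()` uses), and `top_variable_genes` and the gene ids are hypothetical.

```python
import statistics

def top_variable_genes(table, samples, n):
    """Return the n gene ids with the highest standard deviation across samples."""
    if len(samples) < 2:
        raise ValueError("at least two samples are needed to compute a standard deviation")
    sd = {
        gene: statistics.stdev([values[sample] for sample in samples])
        for gene, values in table.items()
    }
    ranked = sorted(sd, key=sd.get, reverse=True)  # high to low
    return ranked[:n]

# Hypothetical expression table: {gene id: {sample: value}}.
expression = {
    "geneA": {"t0": 1.0, "t1": 1.0, "t2": 1.0},   # flat profile
    "geneB": {"t0": 0.0, "t1": 5.0, "t2": 10.0},  # most variable
    "geneC": {"t0": 2.0, "t1": 3.0, "t2": 4.0},
}
top_two = top_variable_genes(expression, ["t0", "t1", "t2"], n=2)
```

With these values, `geneB` ranks first and `geneC` second, while the flat `geneA` is dropped.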
5) Gene cluster
You can cluster observations / genes in either an unsupervised (‘K-means’) or a supervised (‘Gene groups’) way. Each cluster can be visualized simultaneously in the resultant line plot.
- K-means
K-means clustering is one of the popular methods for unsupervised clustering of gene expression data. In unsupervised clustering, the cluster membership of each gene is not known in advance; however, the number of clusters into which you want to group the data needs to be specified before clustering. To perform k-means clustering, FungiExpresZ uses the function stats::kmeans() with all default parameters.
- Gene groups
You can also perform supervised clustering of the genes/observations if prior cluster information is provided as gene groups. As mentioned earlier (in the section Assign groups), you can assign gene groups to your data. The same gene groups can be used here to cluster the genes/observations and visualize them simultaneously in the line plot.
6) # of clusters (K-means)
For K-means clustering, the number of clusters into which the data is to be grouped.
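To make the role of the cluster number concrete, here is a small Python sketch of k-means using Lloyd's algorithm. Note the swap: R's `stats::kmeans()` defaults to the Hartigan-Wong algorithm, so this sketch is not equivalent to what FungiExpresZ runs and can give different assignments on real data. The function names, the seed and the toy profiles are assumptions for illustration.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20, seed=0):
    """Lloyd's algorithm: returns one cluster index (0..k-1) per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = []
    for _ in range(n_iter):
        # Assignment step: attach each gene profile to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy gene profiles over three samples: two low genes and two high genes.
profiles = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [10.0, 10.0, 10.0], [10.1, 9.9, 10.0]]
labels = kmeans(profiles, k=2)
```

The cluster indices themselves are arbitrary; only which genes share an index is meaningful.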
7) Cluster by
You can choose either Z-score or Raw values to cluster the genes/observations.
- Raw value
When the option ‘Raw value’ is selected, the user-uploaded values will be used to cluster the genes / observations.
- Z-score
When the option ‘Z-score’ is selected, FungiExpresZ will use the Z-score calculated from the raw values of each observation/gene across the selected samples. To calculate the Z-score, FungiExpresZ uses the R function base::scale() with all default parameters.
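For one gene, the Z-score is each value minus the gene's mean, divided by the gene's standard deviation. The Python sketch below mirrors the defaults of `base::scale()` (centering, then dividing by the sample standard deviation with an n - 1 denominator); applying it gene by gene is an assumption based on the description above, and `z_scores` is a hypothetical name.

```python
import statistics

def z_scores(values):
    """Center a gene's values on their mean and divide by the sample SD (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # A constant gene has no spread; R's scale() yields NaN in this case.
        return [float("nan")] * len(values)
    return [(value - mean) / sd for value in values]

gene_z = z_scores([1.0, 2.0, 3.0])
```

Z-scores put genes with very different absolute expression on a common scale, so clustering reflects the shape of each profile rather than its magnitude.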
8) Display value
As with the parameter ‘Cluster by’, you can use the parameter ‘Display value’ to choose which value is displayed in the plot, regardless of the value selected in ‘Cluster by’.
- Raw value
When the option ‘Raw value’ is selected, the user-uploaded values will be displayed in the output line plot.
- Z-score
When the option ‘Z-score’ is selected, the Z-score calculated for each gene across the selected samples will be displayed in the output line plot.
9) Display lines for
Under this parameter, you can choose whether to display an individual line for each gene in each cluster or a single line showing the average of all genes in each cluster in the output line plot.
- Individual gene
Selecting this option will display an individual line for each gene in each cluster.
- Average of gene
Selecting this option will display an average line (mean or median) for all genes in each cluster.
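The average line can be thought of as a per-sample summary over the genes of a cluster. A minimal Python sketch, with the hypothetical helper `cluster_average` and made-up profiles, is:

```python
import statistics

def cluster_average(profiles, labels, how="mean"):
    """Summarize gene profiles per cluster, sample by sample.

    profiles: {gene: [value per sample]}, labels: {gene: cluster id}.
    """
    if how not in ("mean", "median"):
        raise ValueError("how must be 'mean' or 'median'")
    summarize = statistics.mean if how == "mean" else statistics.median
    by_cluster = {}
    for gene, values in profiles.items():
        by_cluster.setdefault(labels[gene], []).append(values)
    # zip(*rows) walks the samples; each column holds one sample's values.
    return {
        cluster: [summarize(column) for column in zip(*rows)]
        for cluster, rows in by_cluster.items()
    }

gene_profiles = {"g1": [1.0, 2.0, 3.0], "g2": [3.0, 4.0, 5.0], "g3": [10.0, 0.0, 10.0]}
gene_clusters = {"g1": 1, "g2": 1, "g3": 2}
mean_lines = cluster_average(gene_profiles, gene_clusters, how="mean")
```

Each cluster is reduced to a single line with one point per sample, which keeps the plot readable when clusters contain many genes.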
10) Plot
Clicking the ‘Plot’ button will open a plot panel containing the resultant line plot and other plot settings.
Heatmap inputs
A heatmap is a very popular way to represent various genomics or transcriptomics data. Very often, a heatmap is used to reveal hidden patterns in gene expression data. FungiExpresZ uses the powerful R package ComplexHeatmap to create the heatmap. To get the most out of the data, several Row, Column and Legend specific options are provided, which give users a lot of flexibility while creating a heatmap. The input panel for the heatmap is shown in Fig-10.
FIGURE 10: FungiExpresZ Heatmap inputs
1) Select sample(s)
You can select one or more samples from the drop down ‘Select samples’. All selected samples will be displayed in the output heatmap plot.
2) Select gene group(s)
By default, all the observations / genes will be displayed in the output heatmap plot. Optionally, selecting one or more gene group(s) allows you to show the group-specific set of observations / genes in the output plot.
3) Number of genes to plot
By default, FungiExpresZ plots a heatmap of the top 500 most variable genes. However, you can change the number of genes to be shown under the input # of top variable genes to show. The genes are selected from those remaining after the gene group(s) filter (#2) is applied. To select the top variable genes, FungiExpresZ uses the standard deviation calculated for each gene across the selected samples. The higher the standard deviation, the greater the variability, and vice versa. You can also choose the option All genes to display all the genes remaining after the gene group(s) filter, if applied.
4) # of top variable genes to show
The number of top variable genes to display in the heatmap. To select the top variable genes, FungiExpresZ uses the standard deviation calculated for each gene across the selected samples.
5) Cluster by
You can choose either Z-score or Raw values to cluster the genes/observations.
6) Display value
As with the parameter ‘Cluster by’, you can use the parameter ‘Display value’ to choose which value is displayed in the heatmap plot, regardless of the value selected in ‘Cluster by’.
7) Row options
7A. Row names
7B. Row names font size
7C. Row cluster
7D. # row clusters
7E. Row cluster label prefix
7F. Row cluster (within the cluster)
7G. Row dendogram (within the cluster)
7H. Row cluster border (within the cluster)
7I. Add standard deviation heatmap (within the cluster)
7J. Sort by standard deviation
8) Column options
8A. Column names
8B. Column names font size
8C. Column cluster
8D. # column clusters(k-means)
8E. Column cluster label prefix
8F. Column cluster (within the cluster)
8G. Column dendogram
8H. Column annotation
8I. Column annotation height
9) Legend options
9A.
9B.
9C.
9D.
9E.
10) Plot
Plot settings
Common plot settings
Plot specific settings
Scatter plot
// TO DO
FIGURE 12: FungiExpresZ Scatter plot advance options.
Multi-scatter plot
// TO DO
FIGURE 13: FungiExpresZ Multi-scatter plot advance options.
CorrHeatBox
// TO DO
FIGURE 14: FungiExpresZ Corr heat-box advance options.
Density plot
// TO DO
FIGURE 15: FungiExpresZ Density plot advance options.
Histogram
// TO DO
FIGURE 16: FungiExpresZ Histogram advance options.
Joy plot
// TO DO
FIGURE 17: FungiExpresZ Joy plot advance options.
Box plot
// TO DO
FIGURE 18: FungiExpresZ Box plot advance options.
Violin plot
// TO DO
FIGURE 19: FungiExpresZ Violin plot advance options.
Bar plot
// TO DO
FIGURE 20: FungiExpresZ Bar plot advance options
PCA plot
// TO DO
FIGURE 21: FungiExpresZ PCA plot advance options
Line plot
// TO DO
FIGURE 22: FungiExpresZ Line plot advance options
Heatmap
// TO DO
GO analysis and visualizations
// TO DO
FIGURE 24: FungiExpresZ GO analysis inputs
GO plots specific settings
// TO DO
Dot plot and Bar plot
// TO DO
FIGURE 25: FungiExpresZ GO Dot plot and Bar plot advance options
EMAP plot
// TO DO
FIGURE 26: FungiExpresZ GO EMAP plot advance options
CNET plot
// TO DO
FIGURE 27: FungiExpresZ GO CNET plot advance options
UPSET plot
// TO DO
FIGURE 28: FungiExpresZ GO Upset plot advance options
Heat plot
// TO DO
FIGURE 29: FungiExpresZ GO Heat plot advance options
Other panels
// TO DO
About
// TO DO
FIGURE 30: FungiExpresZ `Overview` page
FIGURE 31: FungiExpresZ `Tutorial` page
Downloads
// TO DO
FIGURE 32: FungiExpresZ `Downloads –> Gene expression data` page
FIGURE 32: FungiExpresZ `Download –> GO data` page
News & Updates
// TO DO
FIGURE 33: FungiExpresZ `News & Updates` page
Citations
The Citations page lists all the papers citing FungiExpresZ.
FIGURE 34: FungiExpresZ `Citations` page